ABSTRACT

The Reverse Zeldovich Approximation (RZA) is a reconstruction method that makes it possible
to estimate the cosmic displacement field from galaxy peculiar velocity data and to constrain
initial conditions for cosmological simulations of the Local Universe. In this paper, we
investigate the effect of different observational errors on the reconstruction quality of this method.
For this, we build a set of mock catalogues from a cosmological simulation, varying different
error sources such as the galaxy distance measurement error (0 - 20%), the sparseness of the data
points, and the maximum catalogue radius (3000 - 6000 km/s). We perform the RZA
reconstruction of the initial conditions on these mock catalogues and compare with the actual initial
conditions of the simulation. We also investigate the impact of the fact that only the radial part
of the peculiar velocity is observationally accessible. We find that the sparseness of a dataset
has the most detrimental effect on RZA reconstruction quality. Observational distance
errors also have a significant influence, but it is possible to compensate for this relatively well with
Wiener Filter reconstruction. We also investigate the effect of different object selection
criteria and find that distance catalogues distributed randomly and homogeneously across the sky
(such as spiral galaxies selected for the Tully-Fisher method) allow for a higher reconstruction
quality than when data are preferentially drawn from massive objects or dense environments
(such as elliptical galaxies). We find that the error of estimating the initial conditions with
RZA is always dominated by the inherent non-linearity of data observed at z = 0 rather than
by the combined effect of the observational errors. Even an extremely sparse dataset with high
observational errors still leads to a good reconstruction of the initial conditions on a scale of
≈ 5 Mpc/h.

Key words: cosmology: theory - dark matter - large-scale structure of Universe - galaxies: 
haloes - methods: numerical 



1 INTRODUCTION 

In recent years there has been impressive progress in
the field of modelling the formation and evolution of the large-scale 
structure (LSS) of the Universe. In large part, our understanding of 
this highly non-linear process is enabled by conducting cosmological
N-body simulations, which usually model the time evolution
of a finite subvolume of the Universe from the primordial density 
perturbations through the LSS formation until the present epoch. 
These methods complement observations of our Universe at large 
scales, such as the properties and distributions of galaxies. Here, 
the Local Universe, our cosmological neighbourhood, is the best- 
studied region. 

A very attractive approach to link together these observed 
local structures and the complementary simulation techniques is 
the constrained realizations (CR) method. With the CR algorithm
(Hoffman & Ribak 1991), it is possible to constrain the initial
conditions (ICs) of cosmological simulations using as input observa-
tional data of the Local Universe. The resulting constrained simu- 
lations are able to reproduce the major objects of the Local Uni- 
verse: the Local Supercluster (LSC) with the Virgo cluster, the 
Great Attractor (GA), the Local Void, and the Coma and Perseus-Pisces
clusters. The CLUES project is an international collaboration
of theoretical and observational cosmologists with the goal
of producing such simulations with the highest accuracy possible
(Gottlöber et al. 2010).

A crucial part of research within the CLUES project and 
the motivation for this work is the question of how constrained 
ICs are best generated. Initially, galaxy redshift catalogues were 
used as the input data for constrained simulations (Kolatt et al.
1996; Bistolas & Hoffman 1998; Mathis et al. 2002; Lavaux 2010).
However, galaxy peculiar velocity measurements provide a valuable
alternative to redshift data for generating constrained ICs
(Klypin et al. 2003; Gottlöber et al. 2010). The radial component
of galaxy peculiar velocities can be derived from their redshift and 
an independent measurement of their luminosity or distance. Pe- 
culiar velocities provide a direct tracer of the total mass distribu- 
tion (without galaxy bias), and the associated distances provide 
galaxy positions in real space (without redshift distortions). Fur- 
ther, galaxy peculiar velocities are strongly correlated over large 
distances, and therefore allow for a reconstruction of the underlying 
field over large distances and despite the sparse and inhomogeneous 
sampling. For the same reason, they are less sensitive to nonlinear
effects of structure formation, which become increasingly impor- 
tant at smaller scales. The theoretical framework necessary to use 
radial peculiar velocities as constraints for ICs was developed by
Zaroubi et al. (1999), combining the theory of Gaussian random
fields and their reconstruction from sparse and noisy data sets us- 
ing a Bayesian approach with the CR algorithm. 

With a measurement of the distance $r$ to a galaxy with an estimated
measurement error $\Delta r$, and the observed redshift velocity $cz$,
the radial peculiar velocity $v_r^{\rm pec}$ and its measurement error
$\Delta v_r^{\rm pec}$ are obtained via

$v_r^{\rm pec} = cz - H_0 r$, (1)

$\Delta v_r^{\rm pec} = \Delta r \cdot H_0$. (2)
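As a minimal numerical sketch of the two relations above, i.e. the peculiar velocity as redshift velocity minus Hubble flow and its error as the distance error times the Hubble constant: the function name, the unit convention (distances in Mpc/h, so that H0 = 100 km/s per Mpc/h) and the example galaxy are illustrative assumptions and not taken from the original analysis.

```python
H0 = 100.0  # km/s per (Mpc/h); assumed unit convention, distances in Mpc/h


def radial_peculiar_velocity(cz, r, delta_r):
    """Return (v_r_pec, delta_v_r_pec) in km/s.

    cz      : observed redshift velocity in km/s
    r       : measured distance in Mpc/h
    delta_r : absolute distance error in Mpc/h
    """
    v_pec = cz - H0 * r          # redshift velocity minus Hubble flow
    delta_v_pec = H0 * delta_r   # distance error propagated to velocity
    return v_pec, delta_v_pec


# Hypothetical galaxy at 30 Mpc/h with a 10% distance error
v, dv = radial_peculiar_velocity(cz=3200.0, r=30.0, delta_r=0.10 * 30.0)
```

In this made-up case the velocity error is larger than the peculiar velocity itself, which is the typical situation for individual data points at the edge of a 30 Mpc/h catalogue.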



Several methods exist to measure the absolute magnitude and luminosity
of galaxies. They all entail observational errors and limitations
to a varying degree. The typical observational distance error
$\Delta r$ is in the range 5 - 20%, limiting the distance out to which useful
peculiar velocity datapoints can be obtained.

The Tully-Fisher relation (Tully & Fisher 1977) is an empirical
relationship between the luminosity of a spiral galaxy and the
amplitude of its gas rotation speed. This well-established method
can provide distances with decent accuracy and a high data density
over an appropriately large volume. The relation does not apply
to elliptical galaxies, since they are in general not rotationally
supported and contain little gas. In this case one can use the
fundamental plane (Faber & Jackson 1976; Djorgovski & Davis
1987; Colless et al. 2001), which establishes a relationship between
the luminosity, the central stellar velocity, and the effective
radius of the galaxy. An alternative is the surface brightness
fluctuation (SBF) method (Tonry et al. 2001). However, the
disadvantage of elliptical galaxies is that they are preferentially
located in high-density regions (morphology-density relation, e.g.
van der Wel et al. 2010), which do not sample the large-scale
galaxy flows. Other galaxy distance measurement methods include
the Cepheid period-luminosity relation (Freedman & Madore
1990; Freedman et al. 2001) and the tip of the red giant branch
(TRGB) method (Karachentsev et al. 2004; Rizzi et al. 2007),
although these methods suffer from a limited reach of only about
10-15 Mpc. Data out to very large distances and independent of
galaxy type can be obtained from observations of type Ia
supernovae serving as standard candles (Jha et al. 2007). While this
method is fairly accurate, it rests on serendipity and thus can
provide only very sparse data samples. The Tully-Fisher method of
measuring distances to spiral galaxies is the only one that currently
combines all necessary assets: probing the space regions where
coherent cosmic flows prevail and obtaining an adequate sampling of
the Local Universe volume that would be suitable for a reconstruction
of the underlying field and eventually the cosmological initial
conditions.

If one wants to use this data as constraints for the cosmological 
ICs, it is important to understand the effect of the different observa- 
tional errors, such as the knowledge of only one of the three veloc- 
ity components, the limited distance out to which the data sampling 
extends, the distance errors, sparseness, and the sampling bias towards
high-density regions. The current dataset of peculiar velocities
used by the CLUES project is the Cosmicflows-1 catalogue.
This data was assembled by Tully et al. (2008) and currently
contains distances to 1797 galaxies in 742 groups, composed of different
subsets obtained with different distance measurement methods
and providing a fairly complete sampling of the sky within 3000
km/s. The ongoing observational work in the Cosmicflows program
is currently directed towards preparing a much deeper and larger
sample of peculiar velocities (Courtois et al. 2011a,b; Courtois & Tully
2012; Tully & Courtois 2012). The upcoming Cosmicflows-2 catalogue
will contain ≈ 7000 distance measurements out to 6000
km/s with distance errors as low as 2%, and eventually reach out
to 15 000 km/s in the near future, exceeding all presently available 
data in both data volume and precision. It is of high importance to 
understand how to optimally use this improved data for constrained 
simulations, in what ways the quality of the simulations may be 
enhanced with the new data, and how to improve the constrained 
simulations method itself to optimally utilize the additional infor- 
mation provided by the datasets. 

This work is the second part of a series of papers devoted to 
this problem. In the first paper (Doumler et al. 2012, from here 
on Paper I) we presented the Reverse Zeldovich Approximation 
(RZA) method, which we use to generate constrained initial condi- 
tions from peculiar velocity data. As we showed in Paper I, RZA 
is a significant improvement over the original methodology developed
by Hoffman & Ribak (1991), Zaroubi et al. (1995, 1999) and
Klypin et al. (2003). We refer the reader to Paper I for a thorough
introduction to our method. Here, we investigate the effect of ob- 
servational errors on the RZA reconstruction quality. For this, we 
use mock data drawn from a cosmological simulation. 

This paper is organized as follows. In Section 2 we briefly review
the method we use to reconstruct the initial conditions from
the data and present the set of mock catalogues that we use for
our tests. In Section 3 we perform the RZA reconstruction of ICs
on the mock catalogues and analyze the reconstruction quality
depending on different observational errors and biases. In Section 4
we summarize and discuss the obtained results.



2 METHOD AND TEST DATA 
2.1 RZA reconstruction 

The RZA is a Lagrangian reconstruction method, which applies 
the Zeldovich approximation backwards in time to peculiar veloc- 
ity datapoints. It provides an estimator for the cosmic displacement 
field and the initial position of the observed object's progenitor in 
the linear regime at some early redshift $z_{\rm init}$ where we want to construct
constrained initial conditions. The method is described in detail
in Paper I; we give only a brief summary here.

Given a set of peculiar velocities $v_r^{\rm pec}$ with observational errors
$\Delta v_r^{\rm pec}$, we first apply the Wiener Filter (WF) to the data
(Zaroubi et al. 1999). With the WF, we can filter out the noise
from observational errors and obtain an estimate $\mathbf{v}^{\rm WF}$ of the full
three-dimensional velocity vector $\mathbf{v}$. Then, we estimate the cosmic
displacement field $\boldsymbol{\psi}$ by extrapolating the linear-theory equation
$\mathbf{v} = \dot{a} f \boldsymbol{\psi}$ to the current time of observation, $z = 0$, and obtain the
RZA estimate

$\boldsymbol{\psi}^{\rm RZA} = \frac{\mathbf{v}^{\rm WF}}{H_0 f}$. (3)



We then continue with the Zeldovich approximation (ZA,
Zeldovich 1970; Shandarin & Zeldovich 1989), which relates the
Lagrangian position $\mathbf{q}$ of a datapoint with the Eulerian position $\mathbf{x}(z)$
at redshift $z$ with the first-order^1 Lagrangian perturbation theory
(LPT) equation $\mathbf{x}(z) = \mathbf{q} + \boldsymbol{\psi}(z)$. Since we choose $z_{\rm init}$ such that
the perturbations are very small, we approximate the initial position
$\mathbf{x}_{\rm init}$ of the datapoint at $z_{\rm init}$ with $\mathbf{q}$. We then extrapolate the ZA to
$z = 0$ and apply it in time-reversed direction to obtain an estimate
of $\mathbf{x}_{\rm init}$ for each datapoint,

$\mathbf{x}_{\rm init}^{\rm RZA} = \mathbf{r} - \boldsymbol{\psi}^{\rm RZA}$, (4)



where r is the observed position at z = 0. This position can then 
be used to place constraints on the initial conditions and to run 
constrained simulations; the latter procedure will be analyzed in an 
upcoming paper, which will be Paper III in the series. In this work, 
we concentrate on the question of how well we can recover $\mathbf{x}_{\rm init}$ for
our data. A useful quantity in this context is the RZA error $d^{\rm RZA}$,
which is the distance between the estimated and the actual initial
position,

$d^{\rm RZA} = |\mathbf{x}_{\rm init} - \mathbf{x}_{\rm init}^{\rm RZA}|$. (5)

In our test setup, we can easily compute $d^{\rm RZA}$, since the mock data
is drawn from a cosmological simulation of which the initial condi- 
tions are known. See Paper I for details on how this is accomplished 
in practice. 
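The chain described in this section (displacement from the filtered velocity, time-reversed Zeldovich step, and RZA error) can be sketched in a few lines. This is only an illustration under stated assumptions: positions are in Mpc/h so that H0 = 100 km/s per Mpc/h, the growth rate is approximated by the common fit f ≈ Ω_m^0.55 with a hypothetical Ω_m, and all function names and numbers are made up here. The Wiener Filter step itself is not shown; `v_wf` stands in for its output.

```python
import numpy as np

H0 = 100.0  # km/s per (Mpc/h); assumed units, positions in Mpc/h


def growth_rate(omega_m=0.3, gamma=0.55):
    """Approximate growth rate at z = 0 (assumed fit, hypothetical omega_m)."""
    return omega_m ** gamma


def rza_initial_positions(r, v_wf, f):
    """Estimate displacements and initial positions for arrays of shape (N, 3).

    r    : observed positions at z = 0 in Mpc/h
    v_wf : filtered 3D peculiar velocities in km/s (stand-in for the WF output)
    f    : linear growth rate at z = 0
    """
    psi_rza = v_wf / (H0 * f)   # displacement in Mpc/h
    x_init_rza = r - psi_rza    # time-reversed Zeldovich step
    return psi_rza, x_init_rza


def rza_error(x_init_true, x_init_rza):
    """Distance between the true and the estimated initial positions."""
    return np.linalg.norm(x_init_true - x_init_rza, axis=1)


# Toy example with made-up numbers; f = 0.5 is chosen for easy arithmetic
r = np.array([[10.0, 0.0, 0.0]])
v_wf = np.array([[250.0, 0.0, 0.0]])
psi, x_init = rza_initial_positions(r, v_wf, f=0.5)
d_rza = rza_error(np.array([[6.0, 0.0, 0.0]]), x_init)
```

With these toy values the displacement is 5 Mpc/h, the estimated initial position is 5 Mpc/h from the origin, and the RZA error against the assumed true initial position is 1 Mpc/h. The calculation is purely local: each data point is treated independently of all others.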



2.2 Building the mock catalogues

The procedure of generating realistic mock galaxy radial peculiar 
velocity catalogues was described in detail in Paper I. Here, we give
only a brief summary.

We use the BOX160 simulation (Cuesta et al. 2011), a constrained
simulation of the Local Universe in a volume of boxsize
160 Mpc/h, as the test universe. We build a dark matter halo catalogue
at z = 0 with the halo finder AHF (Knollmann & Knebe
2009), assume that each galaxy sits at the centre of a dark mat-
ter halo, and take the peculiar velocity of the halo as a proxy for 
the galaxy peculiar velocity. We consider only its radial compo- 
nent relative to an arbitrarily chosen mock observer, and add obser- 
vational errors by assuming Gaussian-distributed relative distance 
errors $\delta r$ with some fixed standard deviation $(\delta r)_{\rm rms}$. We further
have to compensate for the fact that Lagrangian perturbation the- 
ory breaks down on small scales where shell crossing occurs. In 
particular, it does not account for the dynamics within virial haloes. 
To address that limitation, we are using here only 'parent' haloes 
as tracers of the large-scale velocity field and ignore substructures. 
'Parent' haloes are defined here as virial haloes that are not con- 
tained, fully or partially, within more massive haloes. 
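The mock-building steps summarised above (radial projection relative to a mock observer, Gaussian relative distance errors) might be sketched as follows. The details are assumptions made for illustration only: the redshift velocity is taken to be error-free, the distance error is applied to the true distance, and the quoted velocity error is the rms relative error times the observed distance times H0. The procedure actually used is the one described in Paper I.

```python
import numpy as np

H0 = 100.0  # km/s per (Mpc/h); assumed unit convention


def make_radial_mock(pos, vel, observer, rms_rel_err, rng):
    """Toy radial peculiar velocity mock with Gaussian relative distance errors.

    pos, vel    : (N, 3) halo positions [Mpc/h] and peculiar velocities [km/s]
    observer    : (3,) position of the mock observer [Mpc/h]
    rms_rel_err : standard deviation of the relative distance error
    rng         : numpy.random.Generator
    """
    sep = np.asarray(pos, dtype=float) - np.asarray(observer, dtype=float)
    r_true = np.linalg.norm(sep, axis=1)
    unit = sep / r_true[:, None]                       # line-of-sight unit vectors
    v_r_true = np.sum(np.asarray(vel, dtype=float) * unit, axis=1)
    cz = H0 * r_true + v_r_true                        # assumed error-free redshift
    scatter = rng.normal(0.0, rms_rel_err, size=r_true.size)
    r_obs = r_true * (1.0 + scatter)                   # perturbed distance
    v_r_obs = cz - H0 * r_obs                          # inferred radial velocity
    dv_r = H0 * rms_rel_err * r_obs                    # assumed error estimate
    return {"r": r_obs, "cz": cz, "v_r": v_r_obs, "dv_r": dv_r}


# Two made-up haloes seen from an observer at the origin
pos = np.array([[10.0, 0.0, 0.0], [0.0, 20.0, 0.0]])
vel = np.array([[100.0, 50.0, 0.0], [30.0, -200.0, 10.0]])
observer = np.zeros(3)
exact = make_radial_mock(pos, vel, observer, 0.0, np.random.default_rng(0))
noisy = make_radial_mock(pos, vel, observer, 0.10, np.random.default_rng(0))
```

In this sketch a distance error changes both the position of the data point and its inferred velocity, because the redshift velocity is held fixed.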



^1 As we saw in Paper I, the first-order LPT is already capable of generating
a very good estimate of $\boldsymbol{\psi}$ from $\mathbf{v}$, while being completely local and thus
applicable to a sparse set of discrete data points. Higher-order LPT would 
unavoidably break this locality and require an integral of the full field over 
the whole volume of interest. 



2.3 The mock catalogue set 

The set of mock catalogues presented here is an expansion of the 
data that we introduced in Paper I. From the BOX160 AHF halo
catalogue, we created a total of 20 different mock peculiar velocity
catalogues in order to test how the observational distance error,
the amount and distribution of data points, and the size of the
observational volume affect the RZA reconstruction. Table 1
summarises the basic parameters of all mocks used in this paper.
Each of the mocks is referred to by a name encoding its properties.
The first letter characterises the method of halo selection (A - E:
by mass cut; L, I, R: by other criteria); the next two digits show
the radius of the observational volume $R_{\rm max}$ in Mpc/h; and the last
two digits are the rms distance error $(\delta r)_{\rm rms}$ in percent. The last two
mock catalogues do not feature distance errors but instead contain 
3D peculiar velocity data, which will be discussed below. In this 
case the last two digits are 3D. 
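The naming scheme just described is regular enough to be decoded mechanically. The helper below is a hypothetical convenience (neither the function nor the `MockSpec` type appears in the paper); it only encodes the rules stated in the text.

```python
import re
from typing import NamedTuple, Optional


class MockSpec(NamedTuple):
    selection: str                  # 'A'-'E': mass cut; 'L', 'I', 'R': other criteria
    r_max: int                      # radius of the observational volume in Mpc/h
    rms_dist_err: Optional[float]   # rms distance error as a fraction; None for 3D mocks
    is_3d: bool                     # True if the mock holds 3D velocities


_NAME = re.compile(r"^([A-ELIR])(\d{2})_(\d{2}|3D)$")


def parse_mock_name(name):
    """Decode a mock name such as 'C30_10' or 'C30_3D'."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid mock name: {name!r}")
    selection, radius, tail = match.groups()
    is_3d = tail == "3D"
    err = None if is_3d else int(tail) / 100.0
    return MockSpec(selection, int(radius), err, is_3d)


fiducial = parse_mock_name("C30_10")
full_velocity = parse_mock_name("C30_3D")
```

For the fiducial mock this yields selection `C`, a radius of 30 Mpc/h and a 10% rms distance error.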

The "fiducial" catalogue is the C30_10, which we consider a
"typical" sparse peculiar velocity dataset. We take the procedure
of considering only main haloes as a proxy for the "grouping"
performed on observational data. The C30_10 contains all main
haloes above a mass cut $M_{\rm min} = 10^{11.9}\,M_\odot/h$ within $R_{\rm max} = 30$
Mpc/h, yielding 588 radial velocity datapoints. This choice gives
the C30_10 similar properties to the grouped Cosmicflows-1 catalogue
(see Paper I for details). The fiducial choice of 10% rms
distance error is also similar to the observational data: while the
median rms distance error is somewhat higher at 13% on the individual
galaxies in Cosmicflows-1, this error reduces when the
galaxies are arranged in groups. The C30_10 is interesting because
even if the data are improving in terms of the number of individual
galaxy distances, the number of galaxy groups in a radius of 30
Mpc/h is probably not going to vary by much, nor is the accuracy
on the most nearby distances. 

With the C30_10 mock as the starting point, we vary the rms
distance error in five steps between none and 20%, yielding the
mocks C30_00 through C30_20. We also vary the mass cut $M_{\rm min}$ in
five steps, from the highest value (see Table 1) down to the minimum of
$10^{11.5}\,M_\odot/h$, yielding the mocks A30_10 through E30_10. A30_10, the sparsest
sample, has the fewest data points of all mocks with only 282 radial
velocities. We also construct mocks with larger observational volumes
around the mock observer, varying $R_{\rm max}$ from 30 to 60 Mpc/h in four steps,
for two different mass cuts, yielding the mocks C30_10 through
C60_10 and E30_10 through E60_10. Although the RZA method
itself places no restrictions on the allowed size of the data volume,
we do not consider $R_{\rm max} > 60$ Mpc/h here in order to avoid problems
with the periodic boundary conditions of the box. The E60_10
mock has the most data points with 7637 radial velocities. We esti- 
mate that this mock is comparable to the upcoming Cosmicflows-2 
catalogue in terms of data quality. 

Table 1. Overview of the mock catalogues extracted from the BOX160 simulation that were used for the reconstructions. From left to right: an abbreviation
used in this chapter to refer to each mock and the reconstruction computed from it; the physical quantity constrained by each data point (either only the radial
component $v_r$ or all three cartesian components of the halo peculiar velocity); the radius of the spherical data zone in Mpc/h; the mock rms distance error that
was added; the mass limit above which haloes are selected; the selection method; the total number $M$ of constraints (for $v_r$ data this is equal to the number of data
points); and the $\sigma_{\rm NL}$ parameter that was used for each Wiener Filter reconstruction to enforce $\chi^2/{\rm dof} = 1$.

Next, we want to explore how halo selection criteria other
than a mass cut affect the reconstruction. Regarding
again the C30_10 mock as a "fiducial" one, we fix the number of
data points at 588, as well as the distance error and the
data volume. Then, for the L30_10 mock, we take the 588 next most
massive points after the ones in C30_10, so that they all have a mass
below $10^{11.9}\,M_\odot/h$, to check the reconstruction quality if only less
massive objects are considered. For the I30_10 mock, we consider
the 588 most isolated objects, leading to a yet different sampling of
the same volume. We define the isolation radius $R_{\rm iso}$ of a halo as the
distance to the next more massive halo, or in other words, the radius
within which a halo is the most massive. We then choose the 588
objects with the largest $R_{\rm iso}$ within 30 Mpc/h, which corresponds to
all haloes within 30 Mpc/h with $R_{\rm iso} > 2.1$ Mpc/h. In the last variation
of data selection, we randomly pick 588 points from all main
haloes in the data volume, regardless of their mass or other properties,
yielding the R30_10 mock. Randomly picking haloes mimics
the observational data of spiral galaxy peculiar velocities obtained
with the Tully-Fisher method, which are not located at the highest
density peaks of the galaxy distribution (as the haloes selected by
mass), but are selected on the random basis of their inclination on
the sky being greater than 45 degrees.
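The isolation criterion used for the I30_10 mock (distance to the nearest more massive halo) can be illustrated with a brute-force sketch. The function names are hypothetical, the pairwise distance matrix scales as N² in memory, and periodic boundary conditions are ignored, so this is a toy version under those assumptions rather than the procedure actually applied to the AHF catalogue.

```python
import numpy as np


def isolation_radii(pos, mass):
    """Distance from each halo to the nearest more massive halo.

    The most massive halo has no such neighbour and is assigned np.inf.
    """
    pos = np.asarray(pos, dtype=float)
    mass = np.asarray(mass, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    # more_massive[i, j] is True if halo j is more massive than halo i
    more_massive = mass[None, :] > mass[:, None]
    dist = np.where(more_massive, dist, np.inf)
    return dist.min(axis=1)


def select_most_isolated(pos, mass, n):
    """Indices of the n haloes with the largest isolation radius."""
    r_iso = isolation_radii(pos, mass)
    return np.argsort(-r_iso)[:n]


# Three made-up haloes on a line
pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
mass = [3.0, 1.0, 2.0]
r_iso = isolation_radii(pos, mass)
picked = select_most_isolated(pos, mass, 2)
```

In the toy example the lightest halo sits right next to the heaviest one and is therefore the least isolated, so it is the one that drops out of the selection.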

Finally, we want to quantify by how much the reconstruction 
quality is degraded by the fact that only the radial component $v_r$
is observable, rather than the full three-dimensional velocity vector
$\mathbf{v}$. This is interesting, since in the future the transverse peculiar
velocities of galaxies may become accessible to observations as well
(Nusser et al. 2012). We construct the mock C30_3D, which has its
data points at the same positions as the C30_00 mock, but for each
object it lists all three components $v_x, v_y, v_z$ of the velocity instead
of $v_r$.

Of course, there are more observational features that could be 
incorporated in the mocks. For example, one could add a Zone of 
Avoidance (ZoA). However, it has been already established that the 
Wiener Filter mean field successfully extrapolates into such unsampled
regions and handles datasets well that are sparse in an inhomogeneous
and/or anisotropic way (Courtois et al. 2012). We already
exploit this behaviour by using sparse mock catalogues with
a very limited observational volume and huge gaps in the data in
underdense regions; adding an additional gap will not fundamentally
change the situation. We confirmed this by performing tests
with mocks featuring a ZoA of ±11°, which produced practically
the same result. Additionally, it would be possible to incorporate
the observational bias due to the apparent-magnitude selection effect into
the mock data, or to model the different galaxy types and measurement
methods in more detail. For this, it would be necessary
to populate the haloes with galaxies of different morphologies and
luminosities by setting up a full semianalytic model on top of the N-body
simulation. This would make the situation unnecessarily complex
without additional insight into the validity of the RZA and our
method of generating constrained initial conditions. We therefore 
defer such in-detail studies to future work. 



3 ANALYSIS AND RESULTS 
3.1 Reconstruction quality 

The method we use rests on the assumption that the peculiar velocity
field of the Local Universe, observed through peculiar velocities
of galaxies, does not deviate too much from the linear velocity field
of the ICs at some early redshift $z_{\rm init}$, so that we can directly use the
values of the radial peculiar velocities $v_r^{\rm pec}$ as constraints $c_i$. This
approach differs from the WF reconstruction from such data by
the requirement that the result must be a linear Gaussian random
field suitable for initial conditions. This is possible because the
peculiar velocities of dark matter haloes at z = 0 do not deviate too
much from the linear Gaussian distribution at initial redshift $z_{\rm init}$
(see Figure 1). One obvious source of incompatible non-linearities
in the data is the virial motions of galaxies gravitationally bound
to larger objects such as galaxy clusters. This can be overcome
by an appropriate grouping of the data points, which effectively
linearises the data. Virial motions are, of course, not the only
source of non-linearity. Another example of non-linearities in
the observable peculiar velocity field at z = 0 is the general
enhancement of halo peculiar velocities due to local overdensities.
Additionally, with the haloes we are sampling the peculiar velocity
field at the positions of the density peaks, which adds another bias
(see Bardeen et al. 1986). The combination of these effects leads
to the non-linear tails of enhanced halo velocities (blue points) at
the high-velocity ends visible in Figure 1. See Sheth & Diaferio
(2001) and Hamana et al. (2003) for a more detailed discussion of
these effects.

Figure 1. Distribution function of one velocity field component ($v_x$) in
BOX160 at the initial conditions z = 30 and from the simulation snapshot
at z = 0 (normalised to the same growth rate). The blue points show the
distribution function of dark matter halo velocities at z = 0. The green curve
shows a theoretical Gaussian distribution, with the mean and standard
deviation adjusted to $\langle v_x \rangle$ and the standard deviation of $v_x$, respectively.

However, the non-linearities can be compensated for in a statistical
sense. Considering the non-linear effects as a form of statistical
scatter, one can add a new non-linearity term $\sigma_{\rm NL}$ to the WF
autocorrelation matrix of the data (Bistolas & Hoffman 1998). As
explained in Paper I (Doumler et al. 2012) of this series, the value
of $\sigma_{\rm NL}$ is chosen such that $\chi^2/{\rm dof} = 1.0$. For observational radial peculiar
velocity data, we find a typical value of $\sigma_{\rm NL} \approx 200$ km/s. The initial
conditions are then created as follows. First one constructs and
inverts the autocorrelation matrix of the data. Then an appropriate
boxsize must be chosen such that the data zone lies well within the
computational volume. Then, the WF/CR operator can be evaluated
on this volume, leading to a linear density field. This field
can then be scaled with the growth factor to the desired starting
redshift $z_{\rm init}$ of the simulation (Eq. 6) and used to set up N-body ICs.

$\delta(\mathbf{x}, t) = D(t)\,\delta_0(\mathbf{x})$ (6)

In the linear approximation the initial shape $\delta_0(\mathbf{x})$ of the overdensity
distribution remains fixed, and its amplitude scales with time
in proportion to the growth factor $D(t)$.
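The tuning of the non-linearity term described above can be illustrated with a small sketch. It rests on assumptions that go beyond the text here: the term is taken to enter as its square on the diagonal of the data autocorrelation matrix, next to the squared observational errors, and χ² is taken as the quadratic form of the constraints with the inverse of that matrix, with the number of degrees of freedom equal to the number of constraints. The function names, the bisection search and the toy covariance are hypothetical; Paper I describes the procedure actually used.

```python
import numpy as np


def chi2_per_dof(c, signal_cov, noise_var, sigma_nl):
    """chi^2/dof of constraints c for a given non-linearity term (km/s)."""
    cov = signal_cov + np.diag(noise_var + sigma_nl ** 2)
    return float(c @ np.linalg.solve(cov, c)) / c.size


def tune_sigma_nl(c, signal_cov, noise_var, upper=2000.0, tol=1e-3):
    """Bisect for the sigma_nl that gives chi^2/dof = 1.

    Returns 0 if the data already satisfy chi^2/dof <= 1 without the extra
    term. chi^2 decreases monotonically with sigma_nl, so bisection applies;
    `upper` is assumed to be large enough to bracket the solution.
    """
    if chi2_per_dof(c, signal_cov, noise_var, 0.0) <= 1.0:
        return 0.0
    lower = 0.0
    while upper - lower > tol:
        mid = 0.5 * (lower + upper)
        if chi2_per_dof(c, signal_cov, noise_var, mid) > 1.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


# Toy data: three uncorrelated constraints that scatter more than the prior allows
c = np.array([300.0, 300.0, 300.0])
signal_cov = np.diag(np.full(3, 100.0 ** 2))
noise_var = np.zeros(3)
sigma_nl = tune_sigma_nl(c, signal_cov, noise_var)
```

In the toy case the result can be checked by hand: each constraint contributes 300² / (100² + σ²) to χ², which equals 1 for σ² = 80000, i.e. σ of about 283 km/s.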

After reconstruction of the three-dimensional peculiar velocity
values $\mathbf{v}^{\rm WF}$ with the Wiener Filter, we obtain the reconstructed
cosmological displacements $\boldsymbol{\psi}^{\rm RZA}$ of the datapoints with Eq. (3).
We compare the displacements $\boldsymbol{\psi}$ of all main haloes within the
$R_{\rm max} = 30$ Mpc/h mock volume with the values of the reconstructed
displacement field $\boldsymbol{\psi}^{\rm RZA}$ at the positions of those haloes. Such a scatter
plot is given in Figure 2 for the C30_10 reconstruction. Here, the
average rms error per component is 4.17 Mpc/h, the slope is 0.64,
and the correlation factor is 0.82. The correlation is significantly
poorer than the one between $\mathbf{v}^{\rm WF}$ and the true $\mathbf{v}$, since the RZA error
(the intrinsic scatter between $\boldsymbol{\psi}$ and $\mathbf{v}$ due to deviations from
first-order LPT) is added on top of the error from the imperfect WF
reconstruction. Still, we obtain a reasonable correlation.

Figure 2. Comparison of WF reconstructed vs. actual displacement $\boldsymbol{\psi}$ for
all haloes within 30 Mpc/h of the observer at the discrete positions of those
haloes, using the C30_10 mock for the reconstruction. The solid line shows
a linear regression fit $\psi_i^{\rm RZA} = \beta \cdot \psi_i^{\rm true} + \epsilon$; the dashed line would be the ideal
result $\psi_i^{\rm RZA} = \psi_i^{\rm true}$, where $i \in \{x, y, z\}$ are the three cartesian components x
(red), y (black), z (blue).

We now perform the same procedure with all 20 mocks. We
perform the Wiener Filter reconstructions with our ICeCoRe code
(see Paper I), after determining the appropriate "non-linearity
regularization" parameter $\sigma_{\rm NL}$ (cf. Table 1). Considering only the
result inside the 30 Mpc/h sphere, we then compare the resulting $\boldsymbol{\psi}^{\rm RZA}$
values with those of the true $\boldsymbol{\psi}$ from the original simulation. This is
fitted with a linear regression line as in Figure 2. However, it would
be tedious to handle such a large number of scatter plots. To present
the results in a more compact way, we concatenate all three cartesian
components for the linear regression and consider the resulting
slope and rms error per component averaged over all three components.
Figure 3 shows the slope (black) and rms error (blue) for the
reconstructed halo displacements $\boldsymbol{\psi}^{\rm RZA}$ vs. the true $\boldsymbol{\psi}$.
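The compact summary described above (one regression over the concatenated cartesian components) could be computed along the following lines. The function name is hypothetical, and the rms error is taken here as the rms difference between reconstructed and true components, which is an assumption about the exact definition used for the figures.

```python
import numpy as np


def regression_summary(psi_true, psi_rec):
    """Fit psi_rec = beta * psi_true + eps over all concatenated components.

    psi_true, psi_rec : arrays of shape (N, 3) in Mpc/h
    Returns (beta, eps, rms_error, correlation).
    """
    x = np.asarray(psi_true, dtype=float).ravel()   # concatenate x, y, z
    y = np.asarray(psi_rec, dtype=float).ravel()
    beta, eps = np.polyfit(x, y, 1)                 # slope and intercept
    rms_error = np.sqrt(np.mean((y - x) ** 2))      # assumed definition
    correlation = np.corrcoef(x, y)[0, 1]
    return beta, eps, rms_error, correlation


# Made-up displacements: the "reconstruction" is a damped, shifted copy
psi_true = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
psi_rec = 0.5 * psi_true + 1.0
beta, eps, rms_error, correlation = regression_summary(psi_true, psi_rec)
```

A slope below one combined with a high correlation, as in this toy case, is the signature of a conservative filter: the pattern of the displacements is recovered while their amplitude is suppressed.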

The different panels in Figure 3 divide the mock reconstructions
into groups by the different mock observational parameters
that are varied. The roman numerals correspond to the mock groups
in Table 1. Panel I shows the dependence of the reconstruction
quality on the rms distance error; panel II varies the mass cut (and
therefore the amount and density of data points) for the halo
mass-selected mocks; panel III shows the effect of increasing the data
volume beyond the default $R_{\rm max} = 30$ Mpc/h for the default mass
cut at 11.9; panel IV repeats the same for lower mass cut mocks at
11.5; panel V shows mocks with different selection methods while
keeping the number of datapoints constant; finally, panel VI compares
radial with three-dimensional input data.

In general, the slope $\beta$ is a measure for how well the underlying
field is constrained by the data. Ideally it should be 1; the
WF introduces a filtering bias due to its conservative nature, and
reduces $\beta$ to a value < 1. The rms error is a measure for how well
the WF result reproduces the "true" solution for the halo displacements.

Figure 3. Slope $\beta$ (black; left scale) and rms error (blue; right scale in Mpc/h) for a linear regression comparison of the reconstructed displacement field $\boldsymbol{\psi}^{\rm RZA}$
with the original $\boldsymbol{\psi}$ within 30 Mpc/h of the observer.

We have to distinguish two different scenarios. Generally, if
$\mathbf{v}^{\rm WF}$ is compared to the true $\mathbf{v}$ at z = 0, the reconstruction can become
almost arbitrarily accurate if we sufficiently increase the data
quality (it is well established that the WF is a very good filter in this
case; see Zaroubi et al. 1999). For RZA, however, we want to estimate
the halo displacements in order to apply the RZA reconstruction.
The disagreement between $\boldsymbol{\psi}^{\rm RZA}$ and the true $\boldsymbol{\psi}$ is dominated
by the RZA error rather than by the WF reconstruction quality;
varying the mock properties has a lesser effect. There is a "wall"
at around 3.5 Mpc/h rms error per component that cannot be penetrated
even with the best-quality mocks. This is the scale at which,
averaged over the mock volume, the $\boldsymbol{\psi}$ and $\mathbf{v}$ fields themselves disagree,
because the quasi-linear assumption is not valid on these
scales. Of course, this disagreement varies from region to region 
and highly depends on the local density (amongst other quantities), 
as shown in Paper I. 



3.1.1 Distance errors and data sparseness 

As expected, the distance errors and the mass cut both have a sig- 
nificant influence on the quality of the reconstruction (groups I and 
II). The latter seems to be more important; starting from the C30_10 
mock, a greater improvement is obtained when the number of data
points is increased than when the distance errors are decreased. This
is interesting when considering the upcoming observational data,
where the sparseness of the data will be reduced more effectively
than the observational distance errors compared to present observations.
The next lower mass cut mock D30_10 with 10% distance
errors but 898 radial velocities instead of 588 gives a better reconstruction
than the mock that keeps the 588 points but has no distance
errors at all. This is remarkable because an rms error of 10%
in distance at a distance of 30 Mpc/h leads to an rms error of 300
km/s on the radial velocities, and because of the Gaussian distance
error distribution, some velocities in the mocks have error bars up
to 100% and higher. It demonstrates that due to their coherence on
large scales, peculiar velocities are an excellent input data source
for the WF despite the large errors; even a few coherent data points
with 100% errors and separated by a few Mpc/h represent a strong
measurement of the local velocity field. 

Increasing the data volume (groups III and IV) has a much smaller
effect on the reconstruction quality within the inner volume of
30 Mpc/h radius. This is expected for peculiar velocity data as
opposed to redshift data, where a larger data volume would lead to
a significantly better overall result. If we increase the total
volume of the mock catalogue out to a distance of 60 Mpc/h, the
improvement on the reconstruction inside 30 Mpc/h is minimal. There
is some effect for the high 11.9 mass cut mocks, since the additional
information partly compensates for the sparse sampling. But for the
low mass cut mocks at 11.5 there is no significant improvement,
although the E60_10 mock already contains 7637 data points
in total. This reflects a known favourable property of the WF; it 
successfully reconstructs the tidal component of the velocity and 
displacement fields, i.e. the part that is induced by the mass
distribution outside the data zone (e.g. Courtois et al. 2012). This
also includes the dipole term, i.e. the bulk motion of the data volume
due to the external field, which in our case is significant (this
external field corresponds to the large-scale Local Flow that is
observable in the Local Universe; see Courtois et al. 2012). Adding
more detail on this outside field does not significantly change the
tidal
component on the inner volume. The RZA reconstruction with WF 
can therefore work with small data volumes compared to density- 
based methods (i.e. based on redshift data). Any such method is 
by design not able to reconstruct the tidal component: information 
outside of the data zone cannot be inferred from galaxy positions 
alone. A density-based Lagrangian reconstruction from a catalogue
volume of only 30 Mpc/h would therefore be of little use. On the
other hand, Lagrangian reconstruction from peculiar velocities is
an ideal tool to study the tidal flows on large scales.

3.1.2 Selection criteria and connection to distance measurement 
methods 

Panel V in Figure 3 compares mocks created from different data
selection criteria while keeping the number of data points constant
at 588 inside 30 Mpc/h. The mass cut C, which picks the 588 most
massive objects, is compared against selecting lower-mass objects
(L), the most isolated (I) and randomly picked (R) objects. It is
very interesting to observe here that any of the alternative
selection methods works better than selecting by mass. Mass selection
is biased towards a higher sampling of the overdense regions and a
poorer sampling of the less dense regions. Any other selection will
sample the less dense regions more completely and therefore conserve
more information about the large-scale modes of the cosmic matter
distribution. Additionally, a more homogeneous sampling not biased
towards the denser regions will be less affected by the non-linear
enhancement bias of the peculiar velocity field (Sheth & Diaferio
2001; Hamana et al. 2003). Therefore, such a sampling creates a WF
solution that is better constrained and has a lower rms error. This
detail is interesting in the context of using different observational
distance measurement methods to obtain input data.

The interpretation is that a galaxy sample more evenly distributed
around the sky and sufficiently probing less dense regions would
lead to better reconstructions than a galaxy sample preferentially
located in dense regions. It suggests that galaxy distance samples
of spiral galaxies (such as those derived from the Tully-Fisher
relation) may be better suited to RZA reconstruction than those
primarily containing early types, which are biased towards massive
surrounding haloes and dense environments. The randomly selected
mocks (R) mimic the observational data of spiral galaxy peculiar
velocities, which are not located at the highest density peaks of
the galaxy distribution, but are selected on the random basis of
their inclination on the sky being greater than 45 degrees (this is
an absolute requirement for the Tully-Fisher method). Similarly, the
lowest-mass-selected mocks (L) mimic preferring less massive spiral
galaxies over more massive elliptical galaxies, and the
isolation-selected mocks (I) mimic preferring galaxies in less dense
environments. All three selection criteria lead to a similar result
and provide a better reconstruction than focussing on the most
massive objects.
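
The four selection rules compared above can be sketched in a few
lines. This is only an illustration under stated assumptions: the
function name, the argument layout and, in particular, the use of
the nearest-neighbour distance as the isolation measure are
hypothetical stand-ins and are not taken from the mock construction
described in this paper.

```python
import math
import random


def select_mock(masses, positions, n, criterion, seed=0):
    """Return the indices of n haloes chosen by a simple selection rule.

    criterion:
      "C" -- the n most massive haloes (mass cut)
      "L" -- the n least massive haloes
      "R" -- n haloes picked at random
      "I" -- the n most isolated haloes, here ranked by the distance to
             the nearest neighbour (an assumed isolation measure)
    """
    indices = list(range(len(masses)))
    if criterion == "C":
        return sorted(indices, key=lambda i: masses[i], reverse=True)[:n]
    if criterion == "L":
        return sorted(indices, key=lambda i: masses[i])[:n]
    if criterion == "R":
        return random.Random(seed).sample(indices, n)
    if criterion == "I":
        def nearest_neighbour_distance(i):
            return min(math.dist(positions[i], positions[j])
                       for j in indices if j != i)
        return sorted(indices, key=nearest_neighbour_distance,
                      reverse=True)[:n]
    raise ValueError(f"unknown selection criterion: {criterion!r}")
```

Keeping n fixed across all four rules, as done for the 588-point
mocks, isolates the effect of where the data points sit from the
effect of how many there are.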

3.1.3 Radial vs. 3D data 

Group VI addresses the question of how much information is missed
due to the fact that only radial components of the velocity are
observable. For this purpose, we constructed the C30_3D mock,
pretending that the full 3D velocity vector would be accessible in
some way. It has three-dimensional velocity data on the usual 588
haloes inside 30 Mpc/h. In C30_3D, there are no added errors because
it has not yet been demonstrated with observational data how one
should derive mock 3D velocity errors from a distance error. This
will happen in the future through direct astrometric measurements.
Nusser et al. (2012) give an estimate of this error. Naively, it can
be expected that the errors scale linearly with distance: the
velocity is proportional to the product of the angular distance and
the angle between two repeated observations of the same object. An
error on the angle yields an error on the velocity proportional to
the distance. There is also a component of the error introduced by
an error on the distance, but this one is also typically proportional
to the distance (as indicated in this paper). We therefore compare
it to C30_00, which has no errors either. Surprisingly, for the dis- 
placement reconstruction, the additional 3D information has much 
less effect than we would have expected, because the total scatter 
is dominated by the RZA error. The WF filtering bias (/? < 1) is 
almost completely removed by adding three-dimensional information
to the data, but the rms error on the displacement components is
reduced only from 4.0 to 3.7 Mpc/h. This means that even for very
high-quality data, the RZA estimate of the displacement ψ is still
significantly limited in precision by the disagreement between the
large-scale velocity field and ψ due to non-linear motions that
do not follow the Zeldovich approximation.
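
The linear scaling of an astrometric velocity error with distance,
mentioned above for hypothetical 3D data, can be illustrated with the
standard conversion between proper motion and transverse velocity.
The sketch below considers only the term caused by the error on the
measured angle. The function name and the example proper-motion error
are illustrative assumptions and do not correspond to any instrument
or to the estimate of Nusser et al. (2012).

```python
# 1 AU/yr expressed in km/s. With this factor, a proper motion in
# microarcsec/yr times a distance in Mpc gives a velocity in km/s.
KM_S_PER_AU_YR = 4.74047


def transverse_velocity_error(distance_mpc, proper_motion_error_muas_yr):
    """Velocity error (km/s) caused by a proper-motion error alone."""
    return KM_S_PER_AU_YR * proper_motion_error_muas_yr * distance_mpc


# Illustrative only: the same angular error at twice the distance
# gives twice the velocity error.
near = transverse_velocity_error(10.0, 1.0)
far = transverse_velocity_error(20.0, 1.0)
print(near, far)
```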



3.2 RZA error distribution 

3.2.1 General 

To obtain a more detailed view of the reconstruction quality, we
consider the displacement error d_RZA for the haloes inside the
mock volume. Figure 4 shows the distribution of d_RZA for different
mocks. Recall that d_RZA is computed for all haloes inside the mock
volume, regardless of whether they were part of the particular mock
from which a reconstruction was computed, so that we can directly
compare the overall reconstruction quality from all mocks.

Compared to the distribution of halo position error without RZA
reconstruction (black dashed line), the RZA gives a significant
improvement. While half of the haloes have an x_init position error
above 11 Mpc/h without RZA, this is the case for only around 10% of
the haloes with RZA reconstruction. Used as constraints for initial
conditions, this leads to much more accurate constrained simulations
(we refer the reader to the upcoming Paper III for more details). The
distribution of course depends on the quality of the mocks.
Typically, the reconstructions from radial velocities have median
d_RZA values around 5 Mpc/h and a skewed d_RZA distribution. For
comparison we also plot the "ideal" RZA (solid black), directly using
the exact known 3D halo velocities v at z = 0 to estimate ψ as
v/(H_0 f). As already mentioned in Paper I, this gives a median d_RZA
of 2.8 Mpc/h inside the mock volume and a similarly skewed d_RZA
distribution. The mock groups I and II are represented by red lines
in the left and right panels of Figure 4, respectively.
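
The "ideal" RZA estimate and the displacement error used above can be
written down compactly. The following is a minimal sketch, assuming
velocities in km/s, displacements and positions in Mpc/h, and
H_0 = 100 km/s per Mpc/h. The function names are hypothetical, and the
growth rate f must be supplied by the user for the adopted cosmology.

```python
import math
import statistics

H0 = 100.0  # km/s per (Mpc/h); assumed unit convention


def zeldovich_displacement(velocity, growth_rate):
    """Linear-theory displacement estimate psi = v / (H0 * f), in Mpc/h."""
    return tuple(v / (H0 * growth_rate) for v in velocity)


def initial_position(position, psi_estimate):
    """Estimated initial position: move the object back by its displacement."""
    return tuple(x - p for x, p in zip(position, psi_estimate))


def displacement_error(psi_estimate, psi_true):
    """Length of the difference between estimated and true displacement."""
    return math.dist(psi_estimate, psi_true)


def median_displacement_error(velocities, true_displacements, growth_rate):
    """Median displacement error over a list of haloes."""
    errors = [
        displacement_error(zeldovich_displacement(v, growth_rate), psi)
        for v, psi in zip(velocities, true_displacements)
    ]
    return statistics.median(errors)
```

Replacing the exact halo velocity by the Wiener Filter estimate at the
halo position would give the displacement error of a mock
reconstruction instead of the "ideal" one.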

Figure 4. Probability distribution function (pdf) of the RZA
displacement error d_RZA for different mock reconstructions, as well
as a theoretical reconstruction using the exact 3D halo velocity of
all haloes within the data zone, i.e. d = |ψ - v/(H_0 f)| (black
solid), and no reconstruction at all, i.e. d = |ψ| (black dashed).
The symbols are placed at the median of each distribution. Each pdf
was convolved with a 0.5 Mpc/h Gaussian kernel to obtain a smoother
plot.

Figure 5. RZA displacement error d_RZA of haloes inside the data zone
binned over the underlying density and the halo velocity for
different mock reconstructions, as well as the ideal RZA (solid
black) and no RZA (dashed black). The number of haloes in each bin is
also given.

The trend of degrading quality with increasing distance errors and
decreasing number of data points is repeated here. However, the
differences between the d_RZA distributions are not very large
considering that the quality of the mocks differs considerably: the
distance errors are varied between none and 20%, and the number of
data points inside the same volume between 282 and 1243. Comparing
the mass cuts in the right panel, there is a hint that the
reconstruction quality starts to saturate: the difference between the
D30_10 and E30_10 catalogues is relatively small. Indeed, if the data
quality increases, the overall reconstruction error is more and more
dominated by the RZA error. The RZA reconstruction quality would thus
not significantly increase if we add data mapping scales below the
d_RZA scale. This critical scale, below which the RZA reconstruction
breaks down, depends on the local density and is 3.5 Mpc/h per
cartesian component on average. Therefore, about one constraint per
(3.5 Mpc/h)^3 on average would provide a sufficient data density.
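
To put this data density into perspective, it can be converted into a
number of constraints for the 30 Mpc/h mock volume. The short sketch
below assumes a spherical data zone and a uniform density of one
constraint per cube of side 3.5 Mpc/h; the function name is
illustrative.

```python
import math


def constraints_for_sphere(radius, critical_scale):
    """Number of constraints for one data point per critical_scale**3.

    radius and critical_scale must be in the same units (here Mpc/h).
    """
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    return volume / critical_scale ** 3


# Roughly 2.6e3 constraints inside a sphere of 30 Mpc/h radius.
n_constraints = constraints_for_sphere(30.0, 3.5)
print(round(n_constraints))
```

For comparison, the mocks considered here contain between 282 and
1243 data points inside the same volume.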

The blue line in the right panel uses the I30_10 mock, with 588
points like the C30_10 mock (red long dashed) but more evenly
distributed, which shows a noticeable improvement in the d_RZA
distribution. The green line in the left panel uses the
three-dimensional data of C30_3D. The distribution is much closer to
the "ideal RZA" than all the radial velocity mocks. We can derive
from this that for RZA reconstruction, more accuracy is actually lost
by having only the radial component than suggested by comparing the
rms errors on ψ (Section 3.1.3).

Figure 5 is conceptually similar to Figure 4 in Paper I, but compares
different reconstructions against each other. The d_RZA error is
binned against the underlying density and the absolute velocity. The
fiducial C30_10 mock (red) is contrasted with the I30_10 mock having
a more uniform data distribution (green), the E30_10 mock having
about twice the number of data points (blue), and the C30_00 mock
having no data errors (purple). "Ideal" RZA (black) and no RZA
(dashed black) are included for comparison. One can again see the
main result: regardless of the detailed data quality, the RZA
reconstruction greatly reduces the distance errors for most of the
objects in the data compared with no Lagrangian reconstruction of
their initial positions (dashed black). The left panel of Figure 5
highlights again the strong dependence of d_RZA on the underlying
density. For the displacement field in underdense regions, it seems
particularly important to have a more homogeneous sampling of the
data, mapping this region well and with a distance error as low as
possible: the C30_10 mock performs noticeably worse here than both
the more homogeneous isolated halo mock I30_10 and the mock C30_00
with no errors. Conversely, in high-density regions, d_RZA is high
and errors on the individual velocities have little influence, since
those velocities do not track the displacement field well due to the
high non-linearity. In this case, it seems better to have more data
points in general: the E30_10 mock performs somewhat better than the
others. Fed with more data, the WF algorithm has more chance to
filter out the noise. It is interesting that in the densest bin
(around 1.5 smoothed density), the WF result from E30_10 performs
even better than the theoretical "ideal" RZA. This can be understood
from the properties of WF as a linear filter. The true halo
velocities, even when perfectly known, are a bad tracer of the
displacement field in the densest regions. But if the surrounding
field is mapped sufficiently well, the WF can create a solution for
the displacement field that is closer to the more linear actual
displacements. The difference is very small though. We can argue that
when there is a subsequent step of Wiener filtering on the velocity
data, rigorously identifying and removing substructure is less
critical than it appeared in Paper I. More important is the insight
that, since reconstruction quality from the different mocks does not
differ as much as one could expect, the procedure of Wiener filtering
and subsequent RZA reconstruction is restricted more by the
limitation of the scheme itself because of the underlying
non-linearity of the system, and to a lesser degree by the actual
data quality.

Figure 6. RZA displacement error of haloes inside the data zone
binned over the peculiar velocity viewing angle γ, i.e. the angle
between the full 3D velocity of the halo v and its observed radial
component. The C30_3D reconstruction utilises full three-dimensional
data and thus does not depend on γ, as is the case for the ideal RZA
and no RZA.



3.2.2 Viewing angle 

It is worth analysing in more detail the impact of having only
constraints on the radial velocity component. The d_RZA distribution
function revealed that this fundamental limitation for RZA may be
more significant than suggested by just the overall reconstruction
rms error. Let us consider the "viewing angle" γ of a single
peculiar velocity observation,

$$\gamma = \arccos\left( \frac{\mathbf{v} \cdot \mathbf{r}}{|\mathbf{v}| \cdot |\mathbf{r}|} \right), \qquad (7)$$



which is the angle between the full three-dimensional velocity
vector of an object and its observed radial component. At γ = 0°,
v_r will have the complete information on an object's peculiar
velocity, while at γ = 90°, the object's motion will be completely
obscured. The majority of objects in the 30 Mpc/h mock volumes have,
in this sense, "unfavourable" viewing angles: γ > 45° for 67% of the
haloes. We can individually compare the reconstruction quality for
ψ_RZA for objects depending on their viewing angle to quantify its
impact.
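
The viewing angle defined above is straightforward to evaluate for a
halo with known position and velocity. The following is a minimal
sketch with a hypothetical function name. It takes the absolute value
of the scalar product so that infalling and outflowing objects are
treated alike and γ always lies between 0° and 90°; this folding is an
assumption made here, chosen to match the 0° to 90° range discussed in
the text.

```python
import math


def viewing_angle_deg(velocity, position):
    """Angle (degrees) between the 3D velocity and the line of sight.

    velocity -- full 3D peculiar velocity vector of the object
    position -- position vector of the object relative to the observer
    """
    dot = sum(v * r for v, r in zip(velocity, position))
    norm = math.hypot(*velocity) * math.hypot(*position)
    # abs() folds inward and outward motion together (assumption);
    # min() guards against rounding slightly above 1.
    cos_gamma = min(1.0, abs(dot) / norm)
    return math.degrees(math.acos(cos_gamma))


# Purely radial motion is fully visible, purely tangential motion is not.
print(viewing_angle_deg((300.0, 0.0, 0.0), (10.0, 0.0, 0.0)))
print(viewing_angle_deg((0.0, 300.0, 0.0), (10.0, 0.0, 0.0)))
```

For velocity directions distributed isotropically with respect to the
line of sight, the expected fraction of objects with γ > 45° is
cos 45° ≈ 0.71, which is comparable to the 67% found in the mock
volumes.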

Figure 6 shows a binning of d_RZA over γ. Along with the models in
Figure 5, it also includes the C30_3D mock reconstruction, which has
the same data points as C30_00, but with all three components, and
is therefore unbiased with respect to γ. In comparison with this,
for the radial velocity reconstructions (coloured bars) there is a
tendency towards higher d_RZA with increasing γ. This is expected:
it is more difficult for the WF to reconstruct the displacement
field at a position where the data has a high γ. It is particularly
instructive to compare the 30° - 45° bin with the 75° - 90° bin. In
both, the unbiased ideal RZA and C30_3D have similar values, but the
radial velocity reconstructions (coloured) show a higher d_RZA
scatter in the 75° - 90° bin. This additional scatter is however
only of the order of 1 - 2 Mpc/h. This means that the WF yields a
reasonable reconstruction for ψ_RZA even for data points at
unfavourable viewing angles.
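
A binning of this kind can be sketched as follows. The bin width of
15° is inferred from the 30° - 45° and 75° - 90° bins mentioned above,
and the use of the median as the summary statistic per bin is an
assumption made for illustration; the function name is hypothetical.

```python
import statistics


def median_error_by_viewing_angle(angles_deg, errors, bin_width=15.0):
    """Median displacement error in bins of viewing angle from 0 to 90 deg.

    Returns one value per bin, or None for a bin without any halo.
    """
    n_bins = int(90.0 / bin_width)
    binned = [[] for _ in range(n_bins)]
    for gamma, d in zip(angles_deg, errors):
        # An angle of exactly 90 degrees goes into the last bin.
        index = min(int(gamma // bin_width), n_bins - 1)
        binned[index].append(d)
    return [statistics.median(values) if values else None
            for values in binned]
```

Applying the same function to the errors of a radial velocity
reconstruction and of a 3D reconstruction allows the two to be
compared bin by bin.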

We can see the net effect of radial vs. 3D data by directly
comparing C30_3D to C30_00 (purple). Then, the effect of having only
radial data can be decomposed into an additional local error at the
most affected positions (high γ), causing some of the d_RZA values
to be significantly higher (more skewed d_RZA distribution), and an
error affecting the reconstruction as a whole by increasing d_RZA
by 1 - 1.5 Mpc/h on average. These errors are remarkably small
compared to how much information on the full velocity vector is
obscured by the radial limitation. Therefore we can state that the
WF + RZA procedure performs very well on radial velocity data.
Most importantly, from Figure 6 it is clear that the radial
limitation is not the dominating error source of RZA reconstruction.

4 SUMMARY AND DISCUSSION

In this paper, we investigated the Reverse Zeldovich Approxima- 
tion (RZA), a Lagrangian reconstruction method to trace a galaxy 
peculiar velocity dataset back in time to early redshifts in the linear 
regime, after which the data can be used to constrain cosmological 
initial conditions and to run constrained simulations of the Local 
Universe. In particular, we tested how the reconstruction quality of 
the RZA method is affected by observational errors and limitations 
present in such datasets. For this, we applied the method to a set 
of mock peculiar velocity catalogues extracted from a cosmologi- 
cal simulation. We investigated the influence of distance measure- 
ment errors, data sparseness, limited data volume, different object 
selection methods, and the effect of being limited to only one of 
three cartesian components of the galaxy peculiar velocity vector, 
by varying all these effects in the different mock catalogues. We 
then used the true initial conditions of the reference simulations
to determine how well the cosmological displacements ψ and the
initial positions x_init can be recovered with RZA.

With our results we can confirm that RZA significantly improves the
reconstruction quality compared to the previous method not using
Lagrangian reconstruction. In our sample, RZA reduces the initial
position errors from 11 Mpc/h to around 4-5 Mpc/h for a realistic
mock data quality. Aside from this, our main conclusion is that the
accuracy of the RZA reconstruction is limited by the inherent
non-linearity of the velocity field at z = 0. The effect of this
non-linearity is always stronger than the combined effect of
observational errors, even for poor-quality datasets. This
non-linearity manifests itself in the fact that the displacement
field ψ and the peculiar velocity field v diverge from their
linear-theory interrelation on a scale of 3.7 Mpc/h on average in
our sample, which sets a hard limit for possible Lagrangian
reconstruction from peculiar velocities. This limit also highly
depends on the local density.

We therefore conclude that observational errors present in peculiar
velocity datasets do not present a major obstacle for applying RZA
reconstruction. Even an extremely sparse dataset with high
observational errors still leads to a good reconstruction of the
initial conditions, with the median error on the initial positions
being ≈ 5 Mpc/h. That said, a significant increase in reconstruction
quality can be obtained by increasing the number density of data
points, i.e. by working with less sparse data. The observational
distance errors and the radial-component limitation have a lesser
influence. One can compensate very well for the uncertainties they
introduce by applying a Wiener Filter reconstruction to the data
prior to RZA. Surprisingly, we find that not knowing two of the three
velocity components introduces an error of only 1-1.5 Mpc/h on
average.

We also find that a better reconstruction is obtained with a 
more homogeneous sample of data points, providing a more com- 
plete mapping of the volume, and the reconstruction quality wors- 
ens if the data points are biased towards the most massive objects 
and therefore preferentially located in high-density regions. This 
translates to the statement that observational distance measurement 
methods selecting galaxies in a more random fashion and not bi- 
ased towards high-density environments provide ideal input data 
for RZA. This is the case for peculiar velocities obtained with the
Tully-Fisher method, which selects spiral galaxies on the random
basis of their inclination on the sky being greater than 45 degrees,
and ignores elliptical galaxies that are strongly biased towards
high-density regions. We therefore argue that Tully-Fisher data is
better suited for RZA reconstruction than peculiar velocity data
obtained from other distance measurement methods.

In an upcoming work (Paper III of the series on RZA) we will present
constrained simulations run from initial conditions obtained with
RZA reconstruction, and analyse their accuracy. Ultimately, the goal
of our work is to apply the method directly to the newest
observational data to obtain constrained simulations of the Local
Universe. With the findings of the analysis presented here, we can
expect that the accuracy with which they will reproduce the
large-scale structure of the Local Universe will be much higher than
that of previous constrained simulations.